- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Banking & Finance (0.96)
- Health & Medicine (0.94)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- (2 more...)
Sycophancy as compositions of Atomic Psychometric Traits
Jain, Shreyans, Yost, Alexandra, Abdullah, Amirali
Sycophancy is a key behavioral risk in LLMs, yet it is often treated as an isolated failure mode arising from a single causal mechanism. We instead propose modeling it as a geometric and causal composition of psychometric traits such as emotionality, openness, and agreeableness, analogous to factor decomposition in psychometrics. Using Contrastive Activation Addition (CAA), we map activation directions to these factors and study how different combinations may give rise to sycophancy (e.g., high extraversion combined with low conscientiousness). This perspective allows for interpretable, compositional vector-based interventions such as addition, subtraction, and projection, which may be used to mitigate safety-critical behaviors in LLMs.
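The vector arithmetic the abstract describes can be sketched roughly as follows. This is our illustration, not the authors' released code: the function names and the mean-difference construction of CAA-style directions are assumptions, with per-trait directions taken from layer activations on contrastive prompt pairs.

```python
import numpy as np

def trait_direction(pos_acts, neg_acts):
    """CAA-style trait direction: mean activation difference between
    trait-positive and trait-negative prompts at a chosen layer.
    Inputs are (n_prompts, hidden_dim) arrays."""
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def compose(directions, weights):
    """Weighted combination of per-trait directions, e.g.
    +1.0 * extraversion - 0.7 * conscientiousness, normalized."""
    v = sum(w * d for w, d in zip(weights, directions))
    return v / np.linalg.norm(v)

def project_out(h, v):
    """Remove the component of hidden state h along direction v,
    one of the projection-style interventions mentioned above."""
    v = v / np.linalg.norm(v)
    return h - (h @ v) * v
```

A steering intervention would then add (or subtract) the composed vector to the residual stream at the chosen layer, or project it out entirely.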
We thank all our reviewers for their feedback! We will respond to (R2, R3) separately from R1 due to their different concerns.

We thank R2 and R3 for their vote of confidence and for giving this work a high score of 9 and 8, respectively. It means a lot to us to see our ideas accepted by our peers at NeurIPS, who also believe that our "work opens many new […]". We experimented with setting all weights to a single fixed value, e.g. […]. However, if we then nudge that value by a small amount, to say 0.6, the network fails completely at the […]. In fact, the best performing values were outside of this training set. We will cite and discuss this work in our revised paper. NeurIPS 2019 will discuss similar themes, and we are excited to see more ideas in this direction from both communities.

We agree with R3 that scaling up is the next step. […] (Stanley, 2009) to scale WANN architectures to scales able to compete on benchmarks such as ImageNet and Atari. We wish to take the time to conduct this investigation thoroughly, and plan to report the findings in a follow-up paper on WANNs. We would also like to thank R3 for the other minor suggestions; we will clarify the labels and information. In the spirit of this extreme experiment, the algorithm used was purposefully kept simple. Our original intention was to focus only on continuous-control RL experiments, and we decided to run MNIST "for fun". We could have confined the paper to only RL experiments (most RL papers don't run MNIST […]). Finally, we do believe there is a connection to the neuroscience field, e.g. "What Artificial Neural Networks can Learn from Animal Brains" (Zador, 2019), whose central theme is that "The first […]".
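The single-shared-weight experiment discussed in the response can be sketched as follows. This tiny network and its topology are illustrative assumptions, not the paper's code; the point is only that every active connection shares one scalar weight, whose value is then swept.

```python
import numpy as np

def forward(x, masks, w):
    """Evaluate a fixed topology in which every active connection
    shares the single weight value w (WANN-style evaluation)."""
    h = x
    for mask in masks:            # mask: 0/1 connectivity per layer
        h = np.tanh((mask * w) @ h)
    return h

# A hand-picked toy topology and input, for illustration only.
masks = [np.array([[1., 0., 1., 0.],
                   [0., 1., 0., 1.],
                   [1., 1., 0., 0.]]),
         np.array([[1., 0., 1.]])]
x = np.array([0.5, -0.2, 0.1, 0.8])

# Sweeping the shared weight shows how strongly behaviour can
# depend on this one scalar, as the response describes.
outputs = {w: forward(x, masks, w) for w in (-2.0, -1.0, 0.6, 1.0, 2.0)}
```

Even a small nudge of the shared value changes the network's output, consistent with the observation that performance is sensitive to this single parameter.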
Selective Matching Losses -- Not All Scores Are Created Equal
Shamir, Gil I., Warmuth, Manfred K.
Learning systems match predicted scores to observations over some domain. Often it is critical to produce accurate predictions in some subset (or region) of the domain, yet less important to predict accurately in other regions. We construct selective matching loss functions by designing increasing link functions over score domains. A matching loss is an integral over the link. The link defines loss sensitivity as a function of the score, emphasizing high-slope, high-sensitivity regions over flat ones. Loss asymmetry drives a model, resolving its underspecification so that it predicts better in high-sensitivity regions, where accuracy matters more, and distinguishes between high- and low-importance regions. A large variety of selective scalar losses can be designed with scaled and shifted Sigmoid and hyperbolic-sine links. Their properties, however, do not extend to the multi-class case: applying them per dimension lacks the ranking sensitivity that assigns importance according to class score ranking. Utilizing composite Softmax functions, we develop a framework for multidimensional selective losses. We overcome limitations of the standard Softmax function, which is good for classification but not for distinguishing between adjacent scores. Selective losses have a substantial advantage over traditional losses in applications with more important score regions, including dwell-time prediction, retrieval, ranking with pointwise, contrastive pairwise, or listwise losses, distillation problems, and fine-tuning alignment of Large Language Models (LLMs).
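The scalar construction can be sketched numerically. For an increasing link f, the matching loss is L(s_hat, s) = ∫ from s to s_hat of (f(z) − f(s)) dz, so its sensitivity tracks the slope of the link. The function names and the trapezoidal integration below are ours; the paper defines the loss analytically.

```python
import numpy as np

def sigmoid_link(z, a=4.0, b=0.0):
    """Scaled and shifted Sigmoid link: a sets the slope, b the
    location of the high-sensitivity region."""
    return 1.0 / (1.0 + np.exp(-a * (z - b)))

def matching_loss(link, s_hat, s, n=1001):
    """Matching loss of an increasing link, integrated numerically
    by the trapezoid rule over [s, s_hat]:
        L(s_hat, s) = integral of (link(z) - link(s)) dz."""
    z = np.linspace(s, s_hat, n)
    vals = link(z) - link(s)
    dz = (s_hat - s) / (n - 1)
    return float(np.sum((vals[:-1] + vals[1:]) / 2.0) * dz)
```

For the same score error, the loss near the steep center of the link (around b) exceeds the loss in its flat tails, which is exactly the selectivity described above.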
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
- Europe > Russia (0.04)
- Asia > Russia (0.04)
MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks
Veuthey, Jaime Raldua, Majid, Zainab Ali, Hariharan, Suhas, Haimes, Jacob
As Large Language Models (LLMs) advance, their potential for widespread societal impact grows with them. Rigorous LLM evaluation is therefore both a technical necessity and a social imperative. While numerous evaluation benchmarks have been developed, there remains a critical gap in meta-evaluation: effectively assessing the quality of the benchmarks themselves. We propose MEQA, a framework for the meta-evaluation of question-and-answer (QA) benchmarks, to provide standardized assessments and quantifiable scores and to enable meaningful intra-benchmark comparisons. We demonstrate this approach on cybersecurity benchmarks, using human and LLM evaluators, highlighting the benchmarks' strengths and weaknesses. Our choice of test domain is motivated by AI models' dual nature as powerful defensive tools and potential security threats.
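The abstract promises standardized, quantifiable scores but does not spell out the scoring scheme. The following is a purely hypothetical sketch of such an aggregation; the criterion names, weights, and averaging rule are all our assumptions, not MEQA's actual design.

```python
from statistics import mean

# Hypothetical quality criteria; MEQA's actual criteria are not
# listed in the abstract.
CRITERIA = ("question_clarity", "answer_verifiability", "domain_coverage")

def meta_score(ratings, weights=None):
    """Aggregate 0-1 ratings per criterion, collected from several
    evaluators (human or LLM), into one benchmark quality score
    in [0, 1] via a weighted mean."""
    weights = weights or {c: 1.0 for c in CRITERIA}
    total = sum(weights.values())
    return sum(weights[c] * mean(ratings[c]) for c in CRITERIA) / total
```

A scheme of this shape yields a single comparable number per benchmark while preserving the per-criterion breakdown for diagnosing strengths and weaknesses.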
Exploring the Potential of Large Language Models to Simulate Personality
Molchanova, Maria, Mikhailova, Anna, Korzanova, Anna, Ostyakova, Lidiia, Dolidze, Alexandra
With the advancement of large language models (LLMs), the focus in Conversational AI has shifted from merely generating coherent and relevant responses to tackling more complex challenges, such as personalizing dialogue systems. In an effort to enhance user engagement, chatbots are often designed to mimic human behaviour, responding within a defined emotional spectrum and aligning with a set of values. In this paper, we aim to simulate personality traits according to the Big Five model using LLMs. Our research shows that generating personality-related texts remains a challenging task for these models. We therefore present a dataset of generated texts with predefined Big Five characteristics and provide an analytical framework for testing LLMs on simulating personality traits.
- Asia > Russia (0.15)
- North America > United States > New York (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- (2 more...)
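One way to condition a model on a Big Five profile, as the personality-simulation abstract above describes, is simple prompt construction. This template is a hypothetical illustration; the paper's actual prompts and generation setup are not given in the abstract.

```python
# The five traits of the Big Five (OCEAN) model.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")

def personality_prompt(levels, topic):
    """Build a generation prompt from a trait profile, where
    levels maps each trait to 'high' or 'low'."""
    profile = ", ".join(f"{levels[t]} {t}" for t in BIG_FIVE)
    return (f"Write a short text about {topic} as a person with "
            f"the following Big Five profile: {profile}.")
```

Texts generated from such prompts can then be scored by a personality classifier to test whether the requested trait levels actually show up in the output.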